

Evaluating Language-Model Agents on Realistic Autonomous Tasks

Kinniment, Megan, Sato, Lucas Jun Koba, Du, Haoxing, Goodrich, Brian, Hasin, Max, Chan, Lawrence, Miles, Luke Harold, Lin, Tao R., Wijk, Hjalmar, Burget, Joel, Ho, Aaron, Barnes, Elizabeth, Christiano, Paul

arXiv.org Artificial Intelligence

In this report, we explore the ability of language model agents to acquire resources, create copies of themselves, and adapt to novel challenges they encounter in the wild. We refer to this cluster of capabilities as "autonomous replication and adaptation" or ARA. We believe that systems capable of ARA could have wide-reaching and hard-to-anticipate consequences, and that measuring and forecasting ARA may be useful for informing measures around security, monitoring, and alignment. Additionally, once a system is capable of ARA, placing bounds on a system's capabilities may become significantly more difficult. We construct four simple example agents that combine language models with tools that allow them to take actions in the world. We then evaluate these agents on 12 tasks relevant to ARA. We find that these language model agents can only complete the easiest tasks from this list, although they make some progress on the more challenging tasks. Unfortunately, these evaluations are not adequate to rule out the possibility that near-future agents will be capable of ARA. In particular, we do not think that these evaluations provide good assurance that the "next generation" of language models (e.g. 100x effective compute scaleup on existing models) will not yield agents capable of ARA, unless intermediate evaluations are performed during pretraining. Relatedly, we expect that fine-tuning of the existing models could produce substantially more competent agents, even if the fine-tuning is not directly targeted at ARA.


We Can All Learn a Thing or Two From the Dutch AI Tax Scandal

#artificialintelligence

I've made the argument before that technology has a lot to offer the field of tax. But for all the potential that technologies like artificial intelligence and machine learning algorithms have to streamline processes and inform policy, they remain tools--tools that must be wielded by real human beings (at least for now) at the direction of human policymakers (at least for now). The Kinderopvangtoeslagaffaire (literally, the "childcare allowance affair") is a tax and political scandal currently rocking the Netherlands. Despite complex political underpinnings, the core elements of the scandal are straightforward. In 2013, the Dutch government deployed artificial intelligence to handle childcare benefits applications and, as you might guess, it did not go well.


Rethinking the artificial intelligence race

#artificialintelligence

Artificial intelligence (AI) has become a buzzword in technology in both civilian and military contexts. With interest comes a radical increase in extravagant promises, wild speculation, and over-the-top fantasies, coupled with funding to attempt to make them all possible. In spite of this fervor, AI technology must overcome several hurdles: it is costly, susceptible to data poisoning and bad design, difficult for humans to understand, and tailored for specific problems. No amount of money has eradicated these challenges, yet companies and governments have plunged headlong into developing and adopting AI wherever possible. This has bred a desire to determine who is "ahead" in the AI "race," often by examining who is deploying or planning to deploy an AI system.


WATCH THIS: A group of autonomous military robots navigate through an underground power plant

Daily Mail - Science & tech

Scientists convened at an unfinished underground power plant in Elma, Washington to test a group of autonomous military robots in a simulated disaster scenario. The scientists weren't running an experiment but competing in a contest sponsored by the Defense Advanced Research Projects Agency (DARPA), part of its effort to develop a range of autonomous robots to fill a variety of military roles. The winning team came from NASA's Jet Propulsion Laboratory: a 60-person crew that oversaw a group of 12 robots they'd programmed through an initiative called Collaborative SubTerranean Autonomous Robots (CoSTAR). 'The goal is to develop software for our robots that lets them decide how to proceed as they face new surprises,' JPL's Ali Agha said. 'These robots are highly autonomous and for the most part make decisions without human intervention.' CoSTAR's robots autonomously explored the underground plant, which had been designed to simulate an urban disaster environment with a carbon dioxide leak and warm air vent.


Robots, including one by Sony, are coming for fund management jobs

The Japan Times

Remember Aibo, the computerized dog Sony Corp. started selling in 1999 as the first personal robot? Hiro Mizuno, the chief investment officer of the Government Pension Investment Fund, does. So he asked Sony's computer science lab unit to build him a cyberhound using artificial intelligence to help oversee the external fund managers who manage GPIF's ¥175 trillion ($1.6 trillion) in assets. If the training program succeeds, the software watchdog could catch investors who are straying from their comfort zones, help screen potential portfolio managers based on their previous track records, and even distinguish between luck and skill in generating returns. The project, which Mizuno says is part of his experiments in improving the way money is managed, will run through March, but the Sony team recently issued an interim report.


Process & Philosophy Behind 'Training Intelligent Machines' -MobileCoderz

#artificialintelligence

When it came to addressing the complexities of large and complicated scenarios, this manual training process fell short. But with the advent of AI and ML, cyberspace has taken a giant leap forward, and the process of training these algorithmic bots has been automated. They no longer rely on humans to supervise or train them: once their artificial neural network (ANN) reaches a certain level of maturity, they begin learning on their own when exposed to different training sets or data sets. Still, the effort that goes into training a bot in its nascent stage cannot be discounted.
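As a rough illustration of what "learning when exposed to training sets" means in practice, here is a minimal, hypothetical sketch (not from the article) of a single artificial neuron fitting a toy dataset by gradient descent. All names here (`train_neuron`, `predict`, the AND-function data) are illustrative assumptions, and a real ANN would have many layers of such neurons:

```python
import math
import random

def train_neuron(samples, epochs=2000, lr=0.5):
    """Fit a single logistic neuron to (inputs, label) pairs with
    plain gradient descent -- a toy stand-in for 'training on a
    data set'. Assumes binary labels (0 or 1)."""
    random.seed(0)
    n = len(samples[0][0])
    weights = [random.uniform(-0.5, 0.5) for _ in range(n)]
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in samples:
            z = bias + sum(w * x for w, x in zip(weights, inputs))
            pred = 1.0 / (1.0 + math.exp(-z))  # sigmoid activation
            err = pred - label                 # log-loss gradient term
            for i, x in enumerate(inputs):
                weights[i] -= lr * err * x
            bias -= lr * err
    return weights, bias

def predict(weights, bias, inputs):
    # Threshold at z = 0, i.e. predicted probability 0.5.
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if z > 0 else 0

# Toy training set: the logical AND function.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_neuron(data)
print([predict(w, b, x) for x, _ in data])
```

The point of the sketch is the division of labor the article alludes to: a human chooses the data and the learning procedure up front (the "nascent stage" effort), after which the parameters adjust themselves automatically from examples.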